Scientific Workflow Provenance Metadata Management Using an RDBMS

Authors

  • Artem Chebotko
  • Xubo Fei
  • Cui Lin
  • Shiyong Lu
  • Farshad Fotouhi
Abstract

Provenance management has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. This paper proposes an approach to provenance management that seamlessly integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technologies with the storage and querying power of an RDBMS. Specifically, we propose: i) two schema mapping algorithms to map an arbitrary OWL provenance ontology to a relational database schema that is optimized for common provenance queries; ii) two efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema; and iii) a schema-independent SPARQL-to-SQL translation algorithm that is optimized on-the-fly by using the type information of an instance available from the input provenance ontology and the statistics of the sizes of the tables in the database. In addition, we extend SPARQL with negation, aggregation, and set operations to support some important provenance queries. Experimental results are presented to show that our algorithms are efficient and scalable.

Similar resources

Scientific Workflow Provenance Metadata Management Using an RDBMS-based RDF Store

Provenance management has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. This paper proposes an approach to provenance management that seamlessly integrates the interoperability, extensibility, and reasoning advantages of Semantic Web technologies with the storage and querying power...


RDFProv: A relational RDF store for querying and managing scientific workflow provenance

Article history: received 12 October 2008; received in revised form 8 March 2010; accepted 11 March 2010; available online 23 March 2010. Provenance metadata has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. The provenance management problem concerns the efficiency and effectiveness of...


Managing the Deluge of Scientific Data

Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises e...


Storing, reasoning, and querying OPM-compliant scientific workflow provenance using relational databases

Provenance, the metadata that records the derivation history of scientific results, is essential in scientific workflows to support the reproducibility of scientific discovery, result interpretation, and problem diagnosis. To promote and facilitate interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) was first proposed in 2008 and since then has played an imp...


Declarative Model Discovery in Provenance Data for Aiding in Scientific Experiment Planning

Data provenance manages a collection of metadata cataloging the origin and history of data. In scientific workflows, this metadata supports scientific experiment planning. However, the amount of provenance data generated from scientific workflow executions can grow over time, making it infeasible to evaluate manually. Thus, mechanisms for automatically extracting and presenting knowledge from p...


Publication date: 2007